Efficient arithmetic on ARM-NEON and its application for high-speed RSA implementation
نویسندگان
چکیده
Advanced modern processors support Single Instruction Multiple Data (SIMD) instructions (e.g. Intel-AVX, ARM-NEON) and a massive body of research on vector-parallel implementations of modular arithmetic, which are crucial components for modern public-key cryptography ranging from RSA, ElGamal, DSA and ECC, have been conducted. In this paper, we introduce a novel Double Operand Scanning (DOS) method to speed-up multi-precision squaring with non-redundant representations on SIMD architecture. The DOS technique partly doubles the operands and computes the squaring operation without ReadAfter-Write (RAW) dependencies between source and destination variables. Furthermore, we presented Karatsuba Cascade Operand Scanning (KCOS) multiplication and Karatsuba Double Operand Scanning (KDOS) squaring by adopting additive and subtractive Karatsuba’s methods, respectively. The proposed multiplication and squaring methods are compatible with separated Montgomery algorithms and these are highly efficient for RSA crypto system. Finally, our proposed multiplication/squaring, separated Montgomery multiplication/squaring and RSA encryption outperform the best-known results by 22/41%, 25/33% and 30% on the Cortex-A15 platform.
منابع مشابه
Montgomery Modular Multiplication on ARM-NEON Revisited
Montgomery modular multiplication constitutes the “arithmetic foundation” of modern public-key cryptography with applications ranging from RSA, DSA and Diffie-Hellman over elliptic curve schemes to pairing-based cryptosystems. The increased prevalence of SIMD-type instructions in commodity processors (e.g. Intel SSE, ARM NEON) has initiated a massive body of research on vector-parallel implemen...
متن کاملNEON PQCryto: Fast and Parallel Ring-LWE Encryption on ARM NEON Architecture
Recently, ARM NEON architecture has occupied a significant share of tablet and smartphone markets due to its low cost and high performance. This paper studies efficient techniques of lattice-based cryptography on ARM processor and presents the first implementation of ring-LWE encryption on ARM NEON architecture. In particular, we propose a vectorized version of Iterative Number Theoretic Transf...
متن کاملThe Chinese Remainder Theorem and its Application in a High-Speed RSA Crypto Chip
The performance of RSA hardware is primarily determined by an efficient implementation of the long integer modular arithmetic and the ability to utilize the Chinese Remainder Theorem (CRT) for the private key operations. This paper presents the multiplier architecture of the RSA crypto chip, a high-speed hardware accelerator for long integer modular arithmetic. The RSA multiplier datapath is re...
متن کاملEfficient and Side-channel Resistant RSA Implementation For 8-bit AVR Microcontrollers
The RSA algorithm is the most widely used publickey cryptosystem today, but difficult to implement on embedded devices due to the computation-intense nature of its underlying arithmetic operations. Different techniques for efficient software implementation of the RSA algorithm have been proposed; these range from high-level approaches, such as exploiting the Chinese Remainder Theorem (CRT), dow...
متن کاملModified 32-Bit Shift-Add Multiplier Design for Low Power Application
Multiplication is a basic operation in any signal processing application. Multiplication is the most important one among the four arithmetic operations like addition, subtraction, and division. Multipliers are usually hardware intensive, and the main parameters of concern are high speed, low cost, and less VLSI area. The propagation time and power consumption in the multiplier are always high. ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Security and Communication Networks
دوره 9 شماره
صفحات -
تاریخ انتشار 2015